Vision Language Models

VLMs enable businesses to uncover richer insights and tackle complex use cases that text-based retrieval alone cannot solve.

The next wave in enterprise AI

By extending the power of retrieval-augmented generation (RAG) beyond text to include visual data—such as images, charts, and PDFs—VLMs enable businesses to uncover richer insights and tackle complex use cases that text-based retrieval alone cannot solve.

Empowering RAG systems

VLMs empower RAG systems to harness a PDF’s text and visual elements, unlocking a richer and more comprehensive understanding of the document. With VLMs and Vespa.ai, enterprises can build a complete solution for AI-powered intelligent document interaction while taking advantage of Vespa’s hybrid search capabilities and scalable architecture.

Tackle complex use cases

With the integration of vision language models (VLMs), businesses can go beyond traditional text retrieval to seamlessly interpret and analyze images, charts, and PDFs. This enables solutions for intricate scenarios, such as multimodal data insights, decision-making in high-dimensional contexts, and enhanced customer personalization.

Insurance

By leveraging Vespa’s Visual RAG capabilities, insurance companies can detect fraudulent claims by analyzing repeated image patterns across thousands of documents. This significantly accelerates claim processing, empowers claim representatives with deeper insights, and can save millions of dollars annually. Use cases include:

  • Fraud Detection: Identify fraudulent claims by detecting reused or manipulated images across multiple documents.
  • Signature Verification: Locate all documents signed by a specific individual to verify authenticity and prevent forgery.
  • Policy Analysis: Extract and analyze tables and charts within policy documents to ensure compliance and accuracy.

Healthcare

Healthcare institutions can use Vespa’s Visual RAG to process vast archives of patient data. A physician searching for cases involving a rare heart condition can retrieve records that combine imaging results, chart patterns, and written notes, all within milliseconds. This capability not only saves valuable time but also enables faster, more accurate diagnoses, ultimately improving patient outcomes and advancing personalized care. Use cases include:

  • Medical Record Retrieval: Quickly find patient records containing specific visual elements, such as particular imaging results or chart patterns.
  • Clinical Trial Matching: Identify documents with specific text and visual data to match patients to appropriate clinical trials.
  • Diagnostic Image Comparison: Compare medical images across records to track disease progression or treatment efficacy.

E-commerce

A fashion retailer transforms its user experience by enabling customers to search for outfits using photos. A customer snaps a picture of a dress they like; Vespa’s Visual RAG instantly retrieves similar items from the catalog, complete with size and color recommendations. Key use cases include:

  • Product Image Matching: Enable customers to search for products using images, enhancing the shopping experience.
  • Counterfeit Detection: Identify counterfeit products by comparing images in listings to authentic product images.
  • Visual Inventory Management: Analyze product images to manage inventory and detect discrepancies.
  • Visual Product Grouping: Group products based on images.

Financial services

Investment analysts save hours of manual work with VLMs and Vespa. Visual RAG extracts key data from complex charts and tables in financial reports, converting them into actionable insights instantly. This efficiency gives firms a competitive edge in fast-paced markets. Key use cases include:

  • Document Verification: Authenticate financial documents by analyzing signatures and visual elements.
  • Data Extraction: Extract and interpret data from tables and charts in financial reports for analysis.
  • Compliance Monitoring: Ensure documents adhere to regulatory standards by analyzing visual content.
  • Fraud Detection: Identify discrepancies and potential fraudulent activities by scrutinizing financial patterns and anomalies.
  • Money Laundering Protection: Detect and prevent money laundering activities by analyzing transaction flows and visual clues in documentation.
  • Multinational Lending Verification: Verify and assess multinational lending documents to ensure accuracy and compliance across borders.

By bridging the gap between text and visual data, VLMs and Vespa.ai redefine what’s possible in AI-driven applications.

Vespa at a Glance

Fully Integrated Platform

Vespa delivers all the building blocks of an AI application, including vector database, hybrid search, retrieval-augmented generation (RAG), natural language processing (NLP), machine learning, and support for large language models (LLMs).

Integrate all Data Sources

Build AI applications that meet your requirements precisely. Seamlessly integrate your operational systems and databases using Vespa’s APIs and SDKs, ensuring efficient integration without redundant data duplication.

Search Accuracy

Achieve precise, relevant results using Vespa’s hybrid search capabilities, which combine multiple data types—vectors, text, structured, and unstructured data. Machine learning algorithms rank and score results to ensure they meet user intent and maximize relevance.
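
As a rough illustration of how hybrid search can merge text and vector results, the sketch below uses reciprocal rank fusion, one common generic technique for combining ranked lists. It is not a description of Vespa's ranking implementation; the function name, the document ids, and the constant k=60 (a commonly used default) are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one hybrid ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both text and vector search rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and an embedding query.
text_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
print(fused)
```

Rank-based fusion avoids having to calibrate keyword scores against vector similarity scores, which is why it is often a reasonable first choice before moving to a learned ranking model.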

Natural Language Processing

Enhance content analysis with NLP through advanced text retrieval, vector search with embeddings, and integration with custom or pre-trained machine learning models. Vespa enables efficient semantic search, allowing users to match queries to documents based on meaning rather than just keywords.

Visual Search

Search and retrieve data using detailed contextual clues that combine images and text. By enhancing the cross-referencing of posts, images, and descriptions, Vespa makes retrieval more intelligent and visually intuitive, transforming search into a seamless, human-like experience.

Fully Managed Service

Ensure seamless user experience and reduce management costs with Vespa Cloud. Applications dynamically adjust to fluctuating loads, optimizing performance and cost to eliminate the need for over-provisioning.

High Performance at Scale

Deliver instant results through Vespa’s distributed architecture, efficient query processing, and advanced data management. With optimized low-latency query execution, real-time data updates, and sophisticated ranking algorithms, Vespa puts data to work with AI across the enterprise.

Always On

Deliver services without interruption with Vespa’s high availability and fault-tolerant architecture, which distributes data, queries, and machine learning models across multiple nodes.

Secure and Governed

Bring computation to the data distributed across multiple nodes. Vespa reduces network bandwidth costs, minimizes latency from data transfers, and ensures your AI applications comply with existing data residency and security policies. All internal communications between nodes are secured with mutual authentication and encryption, and data is further protected through encryption at rest.

Predictable Low-Cost Pricing

Avoid catastrophic run-time costs with Vespa’s highly efficient and controlled resource consumption architecture. Pricing is transparent and usage-based.

The Challenge: Scaling AI for Visual Data

While proof-of-concept projects for AI often deliver promising results in controlled environments, deploying such solutions at scale introduces significant challenges. Enterprises must address:

  1. Performance at Scale: Maintaining fast, accurate results while searching millions of documents is not trivial, particularly when high query rates at low latency are required.
  2. Managing Costs at Scale: Organizations often find themselves compromising on accuracy, performance, or scalability due to budget constraints.
  3. Modern Model Support: Extensive support for a variety of models, including emerging technology like ColPali, is crucial as AI continues to evolve. Many systems struggle with limited compatibility, hindering their ability to keep pace with technological advances.
  4. Integration Complexity: Connecting AI with diverse data sources across various formats.
  5. Data Privacy and Security: Ensuring sensitive data remains protected during AI processing.
  6. Unstructured Data Processing: Extracting meaningful insights from visual elements, such as images or charts, alongside text.

For large enterprises and government entities, managing vast amounts of documents presents unique challenges. These organizations require solutions that not only scale efficiently but also comply with stringent regulatory standards and operate within secure environments. They also face the arduous task of handling large volumes of legacy documents, which often require manual, imprecise digitization steps and rely on outdated OCR methods, making the process slow and error-prone. Implementing AI-driven visual data solutions in these settings enhances accessibility, expedites decision-making, improves compliance with legal and operational standards, and transforms the efficiency of digitizing and processing historical documents.

Scaling AI to handle these challenges requires not just robust technology but a platform designed to deliver enterprise-grade performance and scalability.

Why Visual RAG?

Visual RAG builds on traditional RAG by introducing the ability to search and retrieve information from both text and visuals. This capability opens up new opportunities for enterprises to:

  • Identify visual patterns from submitted images for use cases like insurance claim analysis.
  • Extract insights from charts, tables, and scanned documents, transforming static visuals into actionable data.
  • Enhance multimodal search by combining image and text queries for more comprehensive results.

For example, an insurance company could analyze images of car damage alongside textual descriptions to refine risk scoring, or a retailer could offer search capabilities that combine product images with detailed specifications.

Vespa: Proven at Scale

Vespa has been tackling the challenges of large-scale AI applications since 2011—long before AI went mainstream. Originally built to meet Yahoo’s vast data needs, Vespa today powers 150 critical applications across Yahoo, delivering personalized content and managing targeted ads in one of the world’s largest ad exchanges. These systems serve nearly one billion users and handle 800,000 queries per second.

By combining Vespa’s battle-tested infrastructure with modern generative AI techniques, this demo showcases how Visual RAG can be applied to solve real-world enterprise challenges at scale.

  • Proven Performance: Battle-tested at massive scale with 800,000 queries per second.
  • Cost-Effective Scaling: Optimized for performance while reducing operational costs.
  • Future-Ready: Extensible for next-gen AI applications.

Vespa Support for Visual RAG

Vespa provides the foundational technology needed to power Visual RAG:

  • Colocated data and compute resources: Vespa’s core principle is to deliver the fastest possible response. This is achieved by keeping data close to compute, so there is no need to move large data volumes across the network.
  • Extensibility: Vespa enables users to adapt to new use cases and technology. Adding support for late interaction models took minimal engineering effort, and the platform is ready for what comes next.
  • Scalable Retrieval: Query massive volumes of unstructured and structured data in real time, without performance bottlenecks. Vespa has showcased multiple optimization methods to cut operational cost.
  • Multimodal Capabilities: Process and combine text, images, and tables in a single AI workflow. Vespa enables the use of the most recent late interaction VLMs.
  • Seamless Integration: Connect to diverse enterprise data sources securely and efficiently. Vespa provides a robust API that is easy to adapt. 
  • Customizable AI Pipelines: Tailor applications to specific business needs, whether risk assessment, visual search, or multimodal recommendations. Vespa is an AI Application Platform that enables Search, Recommendation and any other custom applications. 
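
To make "late interaction" more concrete, the sketch below shows the MaxSim scoring idea used by late interaction models such as ColPali: every query token vector is compared with every page patch vector, the best match per query token is kept, and those maxima are summed. This is a toy illustration in plain Python, not Vespa's implementation; the helper names and the two-dimensional vectors are assumptions made up for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_vectors, page_vectors):
    """Late interaction (MaxSim) score between a query and a page.

    For each query token vector, keep the similarity of its best-matching
    page patch vector, then sum those maxima over all query tokens.
    """
    return sum(
        max(dot(q, p) for p in page_vectors)
        for q in query_vectors
    )


# Hypothetical embeddings: two query token vectors and two patch
# vectors per page. Real models produce far more, higher-dimensional vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5], [0.4, 0.4]]

scores = {
    "page_a": max_sim(query, page_a),
    "page_b": max_sim(query, page_b),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Because each page is represented by many vectors rather than one, this style of scoring can match a query term to the specific region of a page (a chart label, a table cell) where it appears, at the cost of more storage and compute per document.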

With Vespa, enterprises go beyond experimentation and bring Visual RAG to production, solving real-world problems and unlocking new possibilities for data-driven innovation.

Explore more

Building a Visual RAG Demo

How We Built This Visual RAG Demo (Blog Post)

Scaling ColPali to Billions (Blog Post)

To help organizations navigate their choice in RAG adoption, BARC has prepared the research note: Why and How Retrieval-Augmented Generation Improves GenAI Outcomes. Download your free copy here.

Vespa RAG White Paper

Learn more about Vespa technical details for RAG in this white paper.

Vespa at Work

“RavenPack has trusted Vespa.ai open source for over five years–no other RAG platform performs at the scale we need to support our users. Following rapid business expansion, we transitioned to Vespa Cloud. This simplifies our infrastructure and gives us access to expert guidance from Vespa engineers on billion-scale vector deployment. This move allows us to concentrate on delivering innovative solutions to meet our users’ increasingly sophisticated demands.”

“We chose Vespa because of its richness of features, the amazing team behind it, and their commitment to staying up to date on every innovation in the search and NLP space. We look forward to the exciting features that the Vespa team is building and are excited to finalize our own migration to Vespa Cloud.” Yuhong Sun, CoFounder/CoCEO DanswerAI.

Perplexity.ai leverages Vespa Cloud as its web search backend, utilizing a hybrid approach that combines multi-vector and text search. Vespa supports advanced multi-phase ranking, ensuring more accurate and relevant search results.
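
The general idea behind multi-phase ranking can be sketched as follows: a cheap score orders every candidate, and a more expensive score re-orders only the top few. This is a generic illustration under assumed names, not Perplexity's or Vespa's actual configuration; the `keyword` and `semantic` fields, the scoring functions, and `rerank_count` are hypothetical.

```python
def multi_phase_rank(candidates, cheap_score, expensive_score, rerank_count=2):
    """Rank all candidates cheaply, then re-rank only the top ones.

    The first phase runs on every candidate; the second phase runs on
    the best `rerank_count` candidates from the first phase.
    """
    first_phase = sorted(candidates, key=cheap_score, reverse=True)
    head = first_phase[:rerank_count]
    tail = first_phase[rerank_count:]
    reranked = sorted(head, key=expensive_score, reverse=True)
    return reranked + tail


# Hypothetical documents with a precomputed cheap and expensive signal.
docs = [
    {"id": "a", "keyword": 0.9, "semantic": 0.2},
    {"id": "b", "keyword": 0.8, "semantic": 0.9},
    {"id": "c", "keyword": 0.1, "semantic": 1.0},
]

ranked = multi_phase_rank(
    docs,
    cheap_score=lambda d: d["keyword"],
    expensive_score=lambda d: d["semantic"],
)
ranked_ids = [d["id"] for d in ranked]
print(ranked_ids)
```

Note that document "c" stays last despite its high second-phase signal, because it never reaches the re-ranking window. That trade-off is the point of the design: the expensive model is applied only where it is most likely to change the visible results.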